Automatic Natural Language Style Classification and Transformation
Authors
Abstract
Style is an integral part of natural language, whether written, spoken, or machine-generated. Humans have dealt with style since the beginnings of language itself, but computers have only recently begun to process natural language styles. Automatic processing of style poses two interrelated challenges: classification and transformation. Recent work has advanced corpus classification, automatic clustering, and authorship attribution along many dimensions, but little of it addresses writing style directly, and even less addresses style transformation. In this paper we examine the relevant literature to define and operationalize a notion of “style”, which we use to designate style markers usable by machine classifiers. Measurable readings of these markers also help guide style transformation algorithms. We demonstrate the concept by showing a detectable stylistic shift in a sample text relative to a target corpus. We present ongoing work on building a comprehensive style recognition and transformation system and discuss our results.
Similar resources
Automatic Transcription of Lecture Speech using Language Model Based on Speaking-Style Transformation of Proceeding Texts
For language modeling of spontaneous speech recognition, we propose a style transformation approach, which transforms written texts to a spoken-style language model. Since these two styles are largely different and thus direct transformation is difficult, we cascade two transformation methods: rule-based transformation to rewrite written-style texts to intermediate “verbatim” texts, and statist...
Transformation-Based Learning for Automatic Translation from HTML to XML
Format tags implicitly represent content information in the same ambiguous, context-dependent manner that words represent semantics in natural language. Translation from format to content markup shares many characteristics with tagging and parsing tasks in computational linguistics. The transformation-based learning (TBL) paradigm has recently been applied to numerous computational linguistics ...
LEXICALIZING COMPUTATIONAL STYLISTICS For Language Learner Feedback
Computational stylistics refers informally to a collection of tasks within computational linguistics that deal with the style—as opposed to the semantic content—of natural language. The most famous of these tasks is perhaps authorship attribution (Stamatatos et al., 2001), which uses statistical variations in word choice to select the most likely from a fixed set of potential authors. Though ap...
Verb Clustering for Brazilian Portuguese
Levin-style classes, which capture the shared syntax and semantics of verbs, have proven useful for many Natural Language Processing (NLP) tasks and applications. However, lexical resources which provide information about such classes are only available for a handful of the world's languages. Because manual development of such resources is extremely time consuming and cannot reliably capture domain va...
Mercure: Towards an Automatic E-mail Follow-up System
This paper discusses the design and the approach we have developed in order to deal effectively with customer e-mails sent to a corporation. We first present the current state of the art and then make the point that natural language tools are needed in order to deal effectively with the rather informal style encountered in the e-mails. In our project, called Mercure, we have explored three comp...